0.  Summary

This  is  an  expository  paper  which  seeks  to  establish  and  show  the 
value  of  the  following  assertion:  the  concept  of  a  time  series  is 
equivalent  to  the  idea  of  a  probability  measure  on  a  function  space. 

1.  What  is  a  time  series? 

The general point of view adopted in analyzing a time series or a
succession of observations $\{X(t),\ t \in T\}$, depending on the parameter
t (often representing time), is the following.

A  set  T  of  values  of  t,  called  the  index  set  of  the  time  series, 
is  preassigned;  these  are  the  times  when  observations  are  possible.  The 
set  T  may  be  finite  or  infinite.  It  is  possible  to  develop  much  of 
the  theory  of  time  series  without  placing  any  restriction  on  the  nature 
of  the  index  set  T.  However  two  important  cases  are  when 


$T = \{0, \pm 1, \pm 2, \ldots\}$ or $T = \{0, 1, 2, \ldots\}$,


*To be presented at the 1963 Joint Automatic Control Conference
Workshop on "Stochastic Processes", to be held at the University of
Minnesota during June of 1963. This paper was prepared with the partial
support of the Office of Naval Research (Nonr-225-21). Reproduction in
whole or in part is permitted for any purpose of the United States
Government. This paper is based on material to appear in a forthcoming
book [Parzen (1964)].

**It  is  with  great  pleasure  that  I  dedicate  this  paper  to  Professor 
Charles  Loewner  on  his  70th  birthday. 


in which case the time series is said to be a discrete parameter process,
or when

$T = \{t: -\infty < t < \infty\}$ or $T = \{t: t \ge 0\}$,

in which case the time series is said to be a continuous parameter
process.

At  each  point  t  in  T,  a  number  X(t)  may  be  observed.  This 
number  is  a  random  variable  in  the  sense  that  its  value  depends  on  chance 
and  enjoys  a  probability  distribution  described  by  the  one-dimensional 
distribution  function 

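$F_{X(t)}(x) = \text{Probability}[X(t) \le x], \quad -\infty < x < \infty,$

or the one-dimensional characteristic function

$\varphi_{X(t)}(u) = \int_{-\infty}^{\infty} \exp[iux]\, dF_{X(t)}(x), \quad -\infty < u < \infty.$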

More generally, for any integer n and n points $t_1, t_2, \ldots, t_n$ in
T, the n observations $X(t_1), \ldots, X(t_n)$ which can be made at these
times are jointly distributed random variables whose joint probability
law is specified by either (i) the joint distribution function, given
for all real numbers $x_1, \ldots, x_n$ by

(1.1)  $F_{X(t_1),\ldots,X(t_n)}(x_1, \ldots, x_n) = \text{Probability}[X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_n) \le x_n]$

or (ii) the joint characteristic function, given for all real numbers
$u_1, \ldots, u_n$ by

(1.2)  $\varphi_{X(t_1),\ldots,X(t_n)}(u_1, \ldots, u_n) = E[\exp i(u_1 X(t_1) + \cdots + u_n X(t_n))]$
       $= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp i(u_1 x_1 + \cdots + u_n x_n)\, dF_{X(t_1),\ldots,X(t_n)}(x_1, \ldots, x_n).$

The distribution function in Equation (1.1) and the characteristic
function in Equation (1.2) are said to be n-dimensional, since they
represent the joint probability law of n random variables.

The  point  of  view  embodied  in  the  foregoing  discussion  may  be 
summarized  as  follows. 

A time series is a jointly distributed family of random variables
$\{X(t),\ t \in T\}$, indexed by a parameter t varying in an index set T.

Time series analysis is concerned with the statements that can be made
about a time series, knowing only all the finite dimensional distributions
of the form of (1.1).

It should be noted that the phrase "a stochastic process" is often
used to describe a numerical-valued random phenomenon that arises through
a process which is developing in time in a manner controlled by probabilistic
laws [see Parzen (1962)]. Mathematically, a stochastic process is
represented by a collection of random variables $\{X(t),\ t \in T\}$. Thus, in a
sense, the notions of a time series and of a stochastic process are
equivalent; every time series is a stochastic process and vice versa.

What distinguishes the theory of time series from the theory of
stochastic processes is a certain difference in emphasis.

One approach to the problem of developing mathematical models for
empirical phenomena evolving in accord with probabilistic laws is to
characterize such phenomena in terms of the behavior of their first and
second moments. This approach has found important applications in
statistical communications and control theory and in time series analysis.

When a stochastic process is being studied in terms of its moments it is
often called a time series. Consequently one would make the following
definition.

A time series $\{X(t),\ t \in T\}$ is a family of random variables
(stochastic process) with finite second moments.

It is to be emphasized that one should not assume that a stochastic
process which arises in practice necessarily has finite second moments
and is therefore a time series. In particular, recent research (by
Mandelbrot) has raised the question of whether certain series of economic
observations involving price changes of stocks and commodities possess
finite second moments.

Two important characteristics of a time series $\{X(t),\ t \in T\}$ are
its mean value function $m(\cdot)$, defined for all t in T by

$m(t) = E[X(t)]$,

and its covariance kernel $K(\cdot,\cdot)$, defined for all s and t in T
by*

$K(s,t) = \text{Cov}[X(s), X(t)]$.

The  importance  of  the  mean  value  function  and  the  covariance  kernel 
derives  from  several  facts: 

(i) it is usually much easier to find the mean value function and
the covariance kernel of a stochastic process than it is to find its
complete probability law;

(ii) nevertheless, many important questions about a stochastic
process can be answered knowing only its mean value function and
covariance kernel;

(iii)  for  example,  the  continuity,  differentiability  and  integrability 
properties  of  the  covariance  kernel  lead  to  corresponding  properties  for 
the  time  series; 

(iv)  further,  there  exists  an  important  class  of  time  series,  the 
normal  stochastic  processes,  whose  complete  probability  law  is  known  once 
one  knows  its  mean  value  function  and  covariance  kernel. 
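As a hedged illustration of point (i), the following Python sketch estimates a mean value function and a covariance kernel from repeated realizations observed on a finite index set. The helper name `estimate_mean_and_kernel` and the toy random-walk process (for which m(t) = 0 and K(s,t) = min(s,t)) are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def estimate_mean_and_kernel(samples):
    """Estimate m(t) and K(s,t) on a finite index set.

    samples has shape (n_realizations, n_times); each row is one
    observed sample function of the time series.
    """
    samples = np.asarray(samples, dtype=float)
    m_hat = samples.mean(axis=0)                      # estimate of E[X(t)]
    centered = samples - m_hat
    # Unbiased estimate of Cov[X(s), X(t)] for every pair (s, t).
    K_hat = centered.T @ centered / (samples.shape[0] - 1)
    return m_hat, K_hat

# Toy process (an assumption for illustration): a random walk observed at
# t = 1, 2, 3, 4, whose true mean is 0 and true kernel is min(s, t).
rng = np.random.default_rng(0)
steps = rng.normal(size=(50_000, 4))
walks = np.cumsum(steps, axis=1)
m_hat, K_hat = estimate_mean_and_kernel(walks)
```

Only first and second moments are estimated here; nothing is assumed or learned about the complete probability law of the process.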

Normal processes: A stochastic process $\{X(t),\ t \in T\}$ is said to
be a normal process if for any integer n and any subset $\{t_1, t_2, \ldots, t_n\}$
of T the n random variables $X(t_1), \ldots, X(t_n)$ are jointly normally

*In studying the general theory of time series, it is often convenient
to admit complex valued random variables. The covariance kernel is then
defined by

$K(s,t) = E[X(s)\, \overline{X(t)}] - m(s)\, \overline{m(t)}$

where $\overline{X(t)}$ denotes the complex conjugate of $X(t)$ and $\overline{m(t)}$ denotes
the complex conjugate of $m(t)$.
distributed in the sense that their joint characteristic function is
given by, for any real numbers $u_1, u_2, \ldots, u_n$,

$\varphi_{X(t_1),X(t_2),\ldots,X(t_n)}(u_1, \ldots, u_n) = E[\exp i(u_1 X(t_1) + \cdots + u_n X(t_n))]$
$= \exp\left\{ i \sum_{j=1}^{n} u_j E[X(t_j)] - \frac{1}{2} \sum_{j,k=1}^{n} u_j u_k\, \text{Cov}[X(t_j), X(t_k)] \right\}.$


Normal processes play a basic role in time series analysis for a
number of reasons:

(i) Because of the central limit theorem, many random variables
which arise in applications of probability theory may be considered to be
approximately normally distributed; similarly, many stochastic processes
can be approximated by normal processes;

(ii) Because of the mathematical tractability of normal random
variables, many questions can be more simply treated for normal processes
than for other kinds of time series;

(iii) Normal processes have the useful closure property that any time
series [such as $\int_0^t X(s)\, ds$, $X'(t)$, $X(t+1) - X(t)$] derived by means
of linear operations on a normal process is itself a normal process;

(iv) For a normal process, one obtains a knowledge of the complete
probability law of the process from a knowledge of the mean value function
$m(\cdot)$ and the covariance kernel $K(\cdot,\cdot)$. Conversely it may be shown
that if $m(\cdot)$ and $K(\cdot,\cdot)$ are the mean value function and covariance
kernel of some time series, then there is a (unique) normal process with
this mean value function and covariance kernel.
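Point (iv) can be made concrete on a finite set of time points. The sketch below draws sample functions of a normal process from nothing more than a mean value function and a covariance kernel, using a Cholesky factor of the covariance matrix. The function name `sample_normal_process`, the choice m(t) = t, and the kernel K(s,t) = exp(-|s-t|) are assumptions for illustration only.

```python
import math
import numpy as np

def sample_normal_process(m, K, times, n_paths, rng):
    """Draw n_paths realizations of a normal process at the given times.

    m is the mean value function, K the covariance kernel. The kernel
    matrix is assumed positive definite on the chosen time points.
    """
    times = np.asarray(times, dtype=float)
    mean = np.array([m(t) for t in times])
    cov = np.array([[K(s, t) for t in times] for s in times])
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    z = rng.standard_normal((n_paths, len(times)))
    return mean + z @ chol.T                # each row has covariance cov

rng = np.random.default_rng(0)
m = lambda t: t                             # illustrative mean value function
K = lambda s, t: math.exp(-abs(s - t))      # illustrative covariance kernel
times = [0.0, 0.5, 1.0]
paths = sample_normal_process(m, K, times, n_paths=100_000, rng=rng)

# A linear operation on the process, here the increment X(1) - X(0), is
# again normal with variance K(1,1) + K(0,0) - 2 K(0,1).
increments = paths[:, 2] - paths[:, 0]
```

The increment at the end illustrates the closure property of point (iii) in the simplest case of a difference of two coordinates.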
2.  A time series as a probability measure on function space.


A time series $\{X(t),\ t \in T\}$ is for many purposes best regarded as
an observation on a random phenomenon each of whose possible outcomes is
a real valued function with domain T. In other words, a time series
$\{X(t),\ t \in T\}$ is a collection of real valued functions with domain T,
one of which is observed whenever a sample is taken. The observed
function is therefore called a sample function, or realization, of the time
series.

Given an index set T, we let $\Omega_T$ denote the space of all real
valued functions with domain T. A point $\omega$ belonging to $\Omega_T$ is a
function on T whose value at a point t in T we denote by $\omega(t)$;
consequently, we may write $\omega = \{\omega(t),\ t \in T\}$.

The problem of analyzing a time series $\{X(t),\ t \in T\}$ can be
expressed as the problem of finding the probability function $P[\cdot]$,
defined on suitable subsets of $\Omega_T$ called the measurable subsets,
which describes the probability distribution of possible values of the
time series (in the intuitive sense that for any measurable subset A
of $\Omega_T$, P[A] is approximately the relative frequency of observations
in a very long sequence of independent observations of the time series
which are members of A).

In this section we show how, given the finite dimensional probability
laws of a time series, one can construct a probability measure on a
suitable family of subsets of the function space $\Omega_T$. This probability
measure will enable us to define the notion of the probability density
functional of a time series, which plays a central role in modern time
series analysis [see Parzen (1962, 1963)].
Let $\mathcal{A}$ be a collection of subsets of $\Omega_T$ which contains $\Omega_T$ as a
member. A function P, with domain $\mathcal{A}$, is said to be a probability
measure if it possesses the following properties:

Axiom 1. For every $A \in \mathcal{A}$, P[A] is well defined and is a
non-negative real number; in symbols,

$P[A] \ge 0$;

Axiom 2. $P[\Omega_T] = 1$;

Axiom 3. For any sequence of disjoint sets $A_1, A_2, \ldots$ belonging
to $\mathcal{A}$, whose union $\bigcup_{n=1}^{\infty} A_n$ belongs to $\mathcal{A}$,

$P\left[\bigcup_{n=1}^{\infty} A_n\right] = \sum_{n=1}^{\infty} P[A_n]$.

The sets $\{A_n\}$ are said to be disjoint (or non-overlapping) if for any
two distinct indices j and k the intersection of $A_j$ and $A_k$ is
empty,

$A_j A_k = \emptyset$,

where $\emptyset$ denotes the empty set.

Axiom 3 is referred to as the countable additivity or sigma-additivity
property of the probability function P.

The sets belonging to $\mathcal{A}$ are called measurable sets or events. An
event A is said to occur if the function representing the actual time
series observed belongs to A.
In order to guarantee that the usual operations of analysis will
lead  to  measurable  sets,  it  is  necessary  to  require  that  the  family  of 
measurable  sets  be  a  sigma-field. 

A collection $\mathcal{A}$ of subsets of $\Omega_T$ is called a sigma-field if it
has the following properties:

Axiom 1. $\Omega_T$ belongs to $\mathcal{A}$ (written symbolically: $\Omega_T \in \mathcal{A}$);

Axiom 2. If A belongs to $\mathcal{A}$ then the complement $A^c$ belongs
to $\mathcal{A}$ (written symbolically: $A \in \mathcal{A}$ implies $A^c \in \mathcal{A}$);

Axiom 3. For any sequence $A_1, A_2, \ldots$ belonging to $\mathcal{A}$, the
union $\bigcup_{n=1}^{\infty} A_n$ belongs to $\mathcal{A}$ (written symbolically: $\{A_n\} \subset \mathcal{A}$
implies $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$).

In words, a sigma-field is a family of sets which contains the entire
space $\Omega_T$, and is closed under the operations of forming complements and
countable unions. It then follows that it is closed under the operation
of forming countable intersections:

Property 4. For any sequence $A_1, A_2, \ldots$ belonging to $\mathcal{A}$, the
intersection $\bigcap_{n=1}^{\infty} A_n \in \mathcal{A}$.

An important example of a sigma-field is the family of all subsets
of $\Omega_T$.
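On a finite sample space the sigma-field axioms can be checked mechanically, since countable unions reduce to finite ones. The following sketch, whose function name `generate_sigma_field` and toy sample space are assumptions for illustration, builds the smallest family containing some given sets that is closed under complements and unions.

```python
from itertools import combinations

def generate_sigma_field(omega, generators):
    """Smallest family of subsets of a FINITE omega that contains the
    generators and omega, and is closed under complement and union."""
    omega = frozenset(omega)
    field = {omega, frozenset()} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for a in current:                       # close under complements
            comp = omega - a
            if comp not in field:
                field.add(comp)
                changed = True
        for a, b in combinations(current, 2):   # close under unions
            union = a | b
            if union not in field:
                field.add(union)
                changed = True
    return field

# Toy example: the generated family has atoms {1}, {2}, {3, 4}.
field = generate_sigma_field({1, 2, 3, 4}, [{1}, {1, 2}])
```

In this toy case the generated family is strictly smaller than the family of all subsets: the set {3} is not an event, although {3, 4} is. Closure under intersections is obtained automatically from complements and unions.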

The question naturally arises: cannot all subsets of $\Omega_T$ be made
events so that it is never necessary to consider a sigma-field $\mathcal{A}$
smaller than the family of all subsets of $\Omega_T$? Unfortunately the general
answer to this question is in the negative; if we desire the probability
function P to be countably additive, it is usually the case that the
family $\mathcal{A}$ of events cannot contain all subsets of $\Omega_T$. For example,
even in the simple case that T consists of a single point so that $\Omega_T$
is just the real line (the set of all real numbers $\omega$ satisfying
$-\infty < \omega < \infty$), there is no probability function P defined on all subsets
of $\Omega_T$ that agrees with the ordinary notion of length on the subintervals
of $\Omega_T$ (see Halmos (1950), p. 70). Usually, when $\Omega_T$ is the real line,
one adopts as the family of events the family $\mathcal{B}$ of Borel sets, where
$\mathcal{B}$ is defined as the smallest sigma-field containing as members all
intervals.

The idea that, for each t, X(t) is a random variable can now be
made precise by the following definition: for each t in T, X(t) is
a function on $\Omega_T$ whose value at a point $\omega$ in $\Omega_T$, denoted by $X(t,\omega)$,
is given by the value at t of the function $\omega$:

(2.1)  $X(t,\omega) = \omega(t)$.

In order to regard X(t) as a random variable, there must exist

(i) a family $\mathcal{A}$ of subsets of $\Omega_T$, called the measurable sets,
such that for all t in T and all real numbers x

(2.2)  $\{\omega: X(t,\omega) \le x\} \in \mathcal{A}$,

and (ii) a probability measure P with domain $\mathcal{A}$.

A function X with domain $\Omega_T$ is said to be $\mathcal{A}$-measurable if for
every real number x

$\{\omega: X(\omega) \le x\} \in \mathcal{A}$.


The problem at hand is to find a sigma-field $\mathcal{A}$ of subsets of $\Omega_T$ such that
each random variable X(t) is $\mathcal{A}$-measurable.

In the function space $\Omega_T$ there is a smallest sigma-field of
events which should belong to the sigma-field of measurable sets. Let
$\mathcal{B}_T$ denote the smallest sigma-field of subsets of $\Omega_T$ which contains
all sets of the form

$\{\omega: \omega(t) \le x\}$

where t is a point in T and x is a real number. It is clear that,
for every t in T, X(t) is $\mathcal{B}_T$-measurable.

Consequently, given a probability measure P on $\mathcal{B}_T$, one can
define by (2.1) a time series $\{X(t),\ t \in T\}$ consisting of random variables
with domain $\Omega_T$, and the finite dimensional probability distributions of
the time series would be given by the formula

(2.3)  $F_{X(t_1),\ldots,X(t_n)}(x_1, \ldots, x_n) = \text{Probability}[X(t_1) \le x_1, \ldots, X(t_n) \le x_n]$
       $= P[\{\omega \in \Omega_T: \omega(t_j) \le x_j \text{ for } j = 1, 2, \ldots, n\}].$

Basic to the theory of time series analysis is the fact that the converse
holds, which follows from a celebrated theorem proved by Kolmogorov (1933).

Kolmogorov's celebrated existence theorem regarding the probability
measure on function space $\Omega_T$ induced by a stochastic process $\{X(t),\ t \in T\}$.
Let $\{X(t),\ t \in T\}$ be a stochastic process with preassigned finite
dimensional probability distributions. Then there exists a unique
probability measure on the sigma-field $\mathcal{B}_T$ satisfying (2.3).

To prove Kolmogorov's theorem, one considers a somewhat more general
problem.

Let T be an index set. Given a family of finite dimensional
characteristic functions,

(2.4)  $\{\varphi_{t_1,\ldots,t_n}(u_1, \ldots, u_n);\ n \text{ an integer and } t_1, \ldots, t_n \text{ points in } T\}$,

what conditions need this family satisfy in order that there exist a
stochastic process $\{X(t),\ t \in T\}$ whose true finite dimensional
characteristic functions coincide with the given set (2.4):


(2.5)  $\varphi_{X(t_1),\ldots,X(t_n)}(u_1, \ldots, u_n) = \varphi_{t_1,\ldots,t_n}(u_1, \ldots, u_n)$ ?

In view of (2.5), it is obvious that the given set (2.4) must be mutually
consistent in the sense that (i) if $\alpha_1, \ldots, \alpha_n$ is a permutation of
1, ..., n, then for any points $t_1, \ldots, t_n$ and real numbers
$u_1, \ldots, u_n$,

(2.6)  $\varphi_{t_{\alpha_1},\ldots,t_{\alpha_n}}(u_{\alpha_1}, \ldots, u_{\alpha_n}) = \varphi_{t_1,\ldots,t_n}(u_1, \ldots, u_n)$,

since the order in which the random variables are listed is irrelevant,
and (ii) if m < n,

(2.7)  $\varphi_{t_1,\ldots,t_m}(u_1, \ldots, u_m) = \varphi_{t_1,\ldots,t_n}(u_1, \ldots, u_m, 0, \ldots, 0)$.

The content of Kolmogorov's theorem is that these consistency conditions
are the only conditions that need be imposed.
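The two consistency conditions (invariance under a common permutation of the time points and the arguments, and reduction to the lower dimensional function when the trailing arguments are set to zero) can be checked numerically for a concrete family. The sketch below does so for the family of normal characteristic functions built from a mean value function and a covariance kernel. The names `normal_cf`, `m`, `K` and the particular functions chosen are assumptions for illustration.

```python
import numpy as np

def normal_cf(times, u, m, K):
    """Normal characteristic function phi_{t_1,...,t_n}(u_1,...,u_n)
    built from a mean value function m and a covariance kernel K."""
    times = np.asarray(times, dtype=float)
    u = np.asarray(u, dtype=float)
    mean = np.array([m(t) for t in times])
    cov = np.array([[K(s, t) for t in times] for s in times])
    return np.exp(1j * (u @ mean) - 0.5 * (u @ cov @ u))

m = lambda t: np.sin(t)                    # illustrative mean value function
K = lambda s, t: np.exp(-abs(s - t))       # illustrative covariance kernel

times = [0.1, 0.7, 1.5]
u = [0.3, -1.2, 0.8]

# (i) Permuting the time points and the arguments together changes nothing.
perm = [2, 0, 1]
permuted_ok = np.isclose(
    normal_cf([times[i] for i in perm], [u[i] for i in perm], m, K),
    normal_cf(times, u, m, K),
)

# (ii) Setting the last argument to zero recovers the 2-dimensional function.
marginal_ok = np.isclose(
    normal_cf(times[:2], u[:2], m, K),
    normal_cf(times, [u[0], u[1], 0.0], m, K),
)
```

Both checks succeed for any mean value function and any kernel, because the exponent is a sum over indices and terms with a zero argument vanish.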

In order to give a precise statement of Kolmogorov's theorem we
introduce the notion of a semi-infinite interval.

A subset C of $\Omega_T$ is called a semi-infinite interval if it is of
the form

(2.8)  $C = \{\omega \in \Omega_T: \omega(t_1) \le x_1, \ldots, \omega(t_n) \le x_n\}$

for some integer n, points $t_1, \ldots, t_n$ belonging to T, and real
numbers $x_1, \ldots, x_n$. Note that to specify a semi-infinite interval
one must specify an integer n, n points $t_1, \ldots, t_n$ in the index
set T, and n real numbers $x_1, \ldots, x_n$.

Example 2A.

A semi-infinite interval. Let T = [0,1] so that $\Omega_T$ consists of
all functions on the interval 0 to 1. Let n = 11, $t_j = j/10$ for
j = 0, 1, ..., 10, and $x_j = j/2$ for j = 0, 1, ..., 10. Then

$C = \{\omega \in \Omega_{[0,1]}: \omega(j/10) \le j/2 \text{ for } j = 0, 1, \ldots, 10\}$.

Consider the following functions defined on the interval 0 to 1:

$f_1(t) = t + 1$,
$f_2(t) = t^2$,
$f_3(t) = \sin 5t$.

It may be shown that $f_1(\cdot)$ does not belong to C, while $f_2(\cdot)$ and
$f_3(\cdot)$ do belong to C.
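Membership in a semi-infinite interval only involves finitely many coordinates, so the claims of Example 2A can be checked directly. The sketch below assumes the inequalities defining C are non-strict, in line with the usual convention for distribution functions; the helper name `in_semi_infinite_interval` is hypothetical.

```python
import math

def in_semi_infinite_interval(f, times, bounds):
    """True if f(t_j) <= x_j at every specified time point t_j."""
    return all(f(t) <= x for t, x in zip(times, bounds))

times = [j / 10 for j in range(11)]     # t_j = j/10, j = 0, ..., 10
bounds = [j / 2 for j in range(11)]     # x_j = j/2

candidates = {
    "f1": lambda t: t + 1,
    "f2": lambda t: t ** 2,
    "f3": lambda t: math.sin(5 * t),
}
membership = {
    name: in_semi_infinite_interval(f, times, bounds)
    for name, f in candidates.items()
}
```

The function f1 already fails at t = 0, where its value 1 exceeds the bound 0, while f2 and f3 satisfy all eleven constraints. Values of the functions between the specified time points play no role.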

Let P be a function defined on semi-infinite intervals in $\Omega_T$ as
follows: for C given by (2.8)

(2.9)  $P[C] = F_{t_1,\ldots,t_n}(x_1, \ldots, x_n)$

where $F_{t_1,\ldots,t_n}(x_1, \ldots, x_n)$ is the distribution function corresponding
to the characteristic function $\varphi_{t_1,\ldots,t_n}(u_1, \ldots, u_n)$.

Kolmogorov's theorem: There exists a unique probability measure P
on $\mathcal{B}_T$ whose values on semi-infinite intervals satisfy (2.9). Further,
define a family of functions $\{X(t),\ t \in T\}$ on $\Omega_T$ as follows: for
each $\omega \in \Omega_T$, the value of X(t) at $\omega$, denoted $X(t,\omega)$, is given by

$X(t,\omega) = \omega(t)$, the value at t of the function $\omega$.

Then $\{X(t),\ t \in T\}$ is a stochastic process, defined on the probability
space $(\Omega_T, \mathcal{B}_T, P)$, whose finite dimensional characteristic functions
satisfy (2.5).

The proof of Kolmogorov's theorem requires a background in measure
theory which is beyond the scope of this paper [for a proof, see Kolmogorov
(1933), p. 29, or Loève (1960), p. 93].
Example 2B.

An application of Kolmogorov's theorem. Consider a time series
$\{X(t),\ t \in T\}$ defined by the formula

(2.10)  $X(t) = \eta\, \varphi(t), \quad t \in T,$

where $\eta$ is a random variable (with finite second moment) and $\varphi(\cdot)$ is
a non-random function (for example, $\varphi(t) = t$).

The statement that $\eta$ is a random variable is not completely explicit;
we also want to know the space $\Omega$ on which $\eta$ is defined as a function.

The random variables X(t) defined by (2.10) are functions on the
same space $\Omega$ as is $\eta$. Since this space may be unknown, it is sometimes
useful to redefine X(t) as follows.

Given the probability distribution of $\eta$, the family of random
variables $\{X(t),\ t \in T\}$ induces a probability measure P on $\Omega_T$ by
means of the formula

(2.11)  $P[\{\omega \in \Omega_T: \omega(t_1) \le x_1, \ldots, \omega(t_n) \le x_n\}]$
        $= \text{Prob}[X(t_1) \le x_1, \ldots, X(t_n) \le x_n]$
        $= \text{Prob}[\eta\, \varphi(t_1) \le x_1, \ldots, \eta\, \varphi(t_n) \le x_n].$

Next, the random variables $\{X(t),\ t \in T\}$ can be redefined to be functions
on $\Omega_T$ by the formula

(2.12)  $X(t,\omega) = \omega(t)$.


The new family of random variables $\{X(t),\ t \in T\}$ obtained from the
definition (2.12) can be identified with the old family obtained from
definition (2.10) since they have the same finite dimensional
distributions.

By the procedure just described, a time series $\{X(t),\ t \in T\}$
defined by an explicit formula such as (2.10) can be regarded as being
equivalent to a probability measure P on the function space $\Omega_T$.

In particular, suppose that the random variable $\eta$ is normal with
mean $\mu$ and variance $\sigma^2$. Then $X(\cdot)$, defined by (2.10), is a normal
process with mean value function

(2.13)  $m(t) = E[X(t)] = \mu\, \varphi(t)$,

and covariance kernel

(2.14)  $K(s,t) = \sigma^2\, \varphi(s)\, \varphi(t)$.

Any normal process $\{X(t),\ t \in T\}$ with the foregoing mean value function
and covariance kernel induces the same probability measure P on function
space $\Omega_T$.
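The mean value function and covariance kernel of a process of the form "random amplitude times fixed function" can be checked by simulation. The sketch below is only an illustration: the numerical values of the mean and standard deviation, the choice of the identity as the non-random function, and the three time points are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 1.5                 # assumed mean and standard deviation
phi = lambda t: t                    # assumed non-random function
times = np.array([0.5, 1.0, 2.0])

# Each realization is one draw of the random amplitude times phi.
eta = rng.normal(mu, sigma, size=200_000)
paths = eta[:, None] * phi(times)[None, :]

m_hat = paths.mean(axis=0)
K_hat = np.cov(paths, rowvar=False)

m_theory = mu * phi(times)                              # mu * phi(t)
K_theory = sigma ** 2 * np.outer(phi(times), phi(times))  # sigma^2 phi(s) phi(t)
```

Every sample function here is a multiple of the same fixed function, which is why the covariance kernel factors into a product of a function of s and a function of t.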

Using the representation theory of time series (Parzen (1961), p. 962),
it may be shown that a time series $\{X(t),\ t \in T\}$ with mean value function
and covariance kernel given by (2.13) and (2.14) respectively may be
represented in the form of (2.10), where $\eta$ is now a random variable
whose domain is the function space $\Omega_T$. Consequently when considering
time series of the form of (2.10) one may assume that all random variables
are functions on $\Omega_T$.
3.  Probability density functionals and orthogonal measures.

The probability theory of time series is concerned with investigating
the structure of a time series $\{X(t),\ t \in T\}$ whose corresponding
probability measure P on the function space $\Omega_T$ is assumed known. The
statistical theory of time series is concerned with a time series
$\{X(t),\ t \in T\}$ whose probability measure P is not known exactly but is
only known to belong to a class of probability measures $\{P_\theta,\ \theta \in \Theta\}$ on
$\Omega_T$; the class of possible probability measures $P_\theta$ can be assumed to
be indexed by a parameter $\theta$ varying in a parameter set $\Theta$. Consequently
an important step in developing a theory of time series analysis is to
examine the relations that can exist between two probability measures
$P_1$ and $P_2$ with a common domain $\mathcal{A}$.

Absolutely continuous and orthogonal probability measures. Let $\Omega$
be a set (the sample description space) and let $\mathcal{A}$ be a sigma-field of
subsets of $\Omega$. Let $P_1$ and $P_2$ be probability measures with domain
$\mathcal{A}$.

We say that $P_1$ is absolutely continuous with respect to $P_2$ if for
every set A in $\mathcal{A}$

(3.1)  $P_2(A) = 0$ implies $P_1(A) = 0$.

We write $P_1 \ll P_2$ if $P_1$ is absolutely continuous with respect to
$P_2$; the motivation for this notation is the idea that $P_1$ is small
whenever $P_2$ is small.
We say that $P_1$ and $P_2$ are equivalent, denoted $P_1 \equiv P_2$, if
each is absolutely continuous with respect to the other; in symbols,

$P_1 \equiv P_2$ if and only if $P_1 \ll P_2 \ll P_1$.

In order that $P_1$ not be absolutely continuous with respect to $P_2$
it is necessary and sufficient that there exist a set A in $\mathcal{A}$ such
that

(3.2)  $P_2(A) = 0$ and $P_1(A) > 0$.

We say that $P_1$ and $P_2$ are orthogonal (or perpendicular), denoted
$P_1 \perp P_2$, if there exists a set A in $\mathcal{A}$ such that

(3.3)  $P_2(A) = 0$ and $P_1(A) = 1$.

One can regard (3.3) as the extreme case of not being absolutely
continuous.

It may be shown that if $P_1 \ll P_2$, then probability statements in
terms of $P_1$ can be expressed in terms of $P_2$; more precisely, if
$P_1 \ll P_2$, then there exists a function p, called the probability
density function of $P_1$ with respect to $P_2$, such that for any A in
$\mathcal{A}$,

(3.4)  $P_1(A) = \int_A p\, dP_2$.

In words, (3.4) says that to evaluate $P_1(A)$ one integrates the function
p with respect to the measure $P_2$ over the set A. More generally,
for any $\mathcal{A}$-measurable function g which is integrable with respect to
$P_1$, in the sense that

(3.5)  $E_{P_1}[g] = \int_{\Omega} g\, dP_1$

is finite, there holds the transformation formula

(3.6)  $E_{P_1}[g] = \int_{\Omega} g\, p\, dP_2$.
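On a finite sample space in which every subset is an event, absolute continuity, orthogonality, the density of one measure with respect to another, and the transformation formula for expectations can all be computed directly. The following sketch uses hypothetical helper names and toy probability vectors; it is an illustration of the definitions, not a method from the paper.

```python
import numpy as np

def is_absolutely_continuous(p1, p2):
    """P1 << P2 on a finite space: every point with P2-mass 0 has P1-mass 0."""
    return bool(np.all(p1[p2 == 0] == 0))

def are_orthogonal(p1, p2):
    """P1 and P2 are orthogonal iff no point carries mass under both."""
    return bool(np.all((p1 == 0) | (p2 == 0)))

def density(p1, p2):
    """Density p of P1 with respect to P2 (set to 0 where P2 has no mass)."""
    out = np.zeros_like(p1)
    mask = p2 > 0
    out[mask] = p1[mask] / p2[mask]
    return out

# Toy measures on a four-point sample space (assumed for illustration).
P1 = np.array([0.5, 0.5, 0.0, 0.0])
P2 = np.array([0.25, 0.25, 0.5, 0.0])
P3 = np.array([0.0, 0.0, 0.0, 1.0])

p = density(P1, P2)                      # array([2., 2., 0., 0.])
g = np.array([1.0, 2.0, 3.0, 4.0])       # an arbitrary function on the space

expectation_under_P1 = float(np.sum(g * P1))
expectation_via_P2 = float(np.sum(g * p * P2))   # transformation formula
```

Here P1 is absolutely continuous with respect to P2 but not conversely, and P1 and P3 are orthogonal; the two ways of computing the expectation of g agree.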

Three problems preliminary to time series analysis. To develop a
general theory of time series analysis, one must begin with an
understanding of the relations that can exist between two probability measures
$P_1$ and $P_2$ on a function space. Consequently, we may speak of three
problems preliminary to time series analysis:

(i) determine whether two given probability measures $P_1$ and $P_2$
are orthogonal,

(ii) determine whether one is absolutely continuous with respect to
the other,

(iii) if $P_2 \ll P_1$, determine the probability density functional or
Radon-Nikodym derivative, denoted

$p_{2,1} = \frac{dP_2}{dP_1}$.

One aim of modern time series analysis is to develop ways of
determining answers to these questions.
4.  Signal detection and likelihood ratios.


In this section we examine the problem of detecting a signal in
noise, and show how the proper formulation of this problem requires a
consideration of the relations that exist between probability measures.
Many observed time series can be represented as the sum of a signal
process and a noise process. More precisely, let T be a set of points,
called the index set, such that at each t in T one has made (or one
can make) an observation, denoted X(t). The set of observations
$\{X(t),\ t \in T\}$ is a function on T which is assumed to be the sum of two
other functions $\{S(t),\ t \in T\}$ and $\{N(t),\ t \in T\}$:

$X(t) = S(t) + N(t), \quad t \in T.$

We call $\{S(t),\ t \in T\}$ the signal since it is supposed to represent the
true value of the quantity being measured, while $\{N(t),\ t \in T\}$ is
called the noise since it represents "errors", "fluctuations", or "residuals"
by which the observed function $\{X(t),\ t \in T\}$ differs from the desired
function $\{S(t),\ t \in T\}$.

The aim of time series analysis is to infer, from the observations,
information about one or more features of the signal. The aspects of the
signal in which one is interested depend on the assumptions one makes
about the structure of the signal and noise processes. In this section
we consider the important problem of detecting a signal in the presence
of noise.

Let $\{S(t),\ t \in T\}$ and $\{N(t),\ t \in T\}$ be time series, called
respectively the signal process and the noise process. Given an observed
time series $\{X(t),\ t \in T\}$, one desires to test the hypothesis

$H_0$:  $X(\cdot) = N(\cdot)$, noise alone is present,

against the alternative hypothesis

$H_1$:  $X(\cdot) = S(\cdot) + N(\cdot)$, signal plus noise is present,

by choosing a subset R of the sample space $\Omega_T$ of possible
realizations of the time series $\{X(t),\ t \in T\}$, which will be the rejection
region for $H_0$; that is, one says signal plus noise is present if the
observed time series $\{X(t),\ t \in T\}$ belongs to R, and one says that
noise alone is present if $\{X(t),\ t \in T\}$ does not belong to R.

In this section, we suppose that the probability distributions of
$\{X(t),\ t \in T\}$ under the hypotheses $H_0$ and $H_1$ are well defined; in
this case we say that $H_0$ and $H_1$ are simple hypotheses. We may then
introduce probability measures $P_N$ and $P_{S+N}$ defined on the measurable
subsets B of the sample space $\Omega_T$ by

(4.1)  $P_N[B] = \text{Prob}[\{N(t),\ t \in T\} \in B]$
       $= \text{Prob}[\{X(t),\ t \in T\} \in B \mid H_0]$,

(4.2)  $P_{S+N}[B] = \text{Prob}[\{S(t) + N(t),\ t \in T\} \in B]$
       $= \text{Prob}[\{X(t),\ t \in T\} \in B \mid H_1]$.


Example 4A.

A specified signal in normal noise. One important case in which
$H_0$ and $H_1$ are simple hypotheses is when the following assumptions hold.
Assumption on the index set: The index set T is a finite set
$\{t_1, t_2, \ldots, t_n\}$.

Assumption on the noise process: The noise process $\{N(t),\ t \in T\}$
is assumed to possess finite second moments, to have zero means:

(4.3)  $E[N(t)] = 0$,

and known covariance kernel K:

(4.4)  $E[N(s)\, N(t)] = K(s,t)$.

Further, $\{N(t),\ t \in T\}$ is a normal process; that is, the n random
variables $N(t_1), \ldots, N(t_n)$ are jointly normally distributed so that
their joint characteristic function is given by

(4.5)  $\varphi_{N(t_1),\ldots,N(t_n)}(u_1, \ldots, u_n) = \exp\left\{ -\frac{1}{2} \sum_{j,k=1}^{n} u_j u_k\, K(t_j, t_k) \right\}.$

Assumption on the signal process: The signal $\{S(t),\ t \in T\}$ is a
known non-random function. The signal plus noise process
$\{X(t) = S(t) + N(t),\ t \in T\}$ is then a normal process with mean value
function

(4.6)  $E[X(t)] = S(t)$

and covariance kernel

(4.7)  $\mathrm{Cov}[X(s), X(t)] = K(s,t)$.

Under the assumptions of this example, $P_N$ is the probability
measure on function space induced by a normal process with mean value
function identically zero and covariance kernel $K$, while $P_{S+N}$ is the
probability measure induced by a normal process with mean value function
equal to the signal function $S(t)$ and covariance kernel $K$.
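
As a consistency check on (4.6) and (4.7), one may compute the joint characteristic function of the signal plus noise process directly from (4.5). The derivation below is a sketch that uses only the fact that $S$ is non-random; the symbol $\varphi_{X(t_1),\ldots,X(t_n)}$ is notation introduced here by analogy with (4.5).

```latex
\begin{align*}
\varphi_{X(t_1),\ldots,X(t_n)}(u_1,\ldots,u_n)
  &= E\Big[\exp\Big\{ i \sum_{j=1}^{n} u_j \big(S(t_j) + N(t_j)\big) \Big\}\Big] \\
  &= \exp\Big\{ i \sum_{j=1}^{n} u_j S(t_j) \Big\}\,
     \varphi_{N(t_1),\ldots,N(t_n)}(u_1,\ldots,u_n) \\
  &= \exp\Big\{ i \sum_{j=1}^{n} u_j S(t_j)
     - \frac{1}{2} \sum_{j,k=1}^{n} u_j K(t_j,t_k) u_k \Big\}.
\end{align*}
```

The last line is the characteristic function of jointly normal random variables with means $S(t_j)$ and covariances $K(t_j,t_k)$, in agreement with (4.6) and (4.7).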

Perfect detectability and the singular detection problem: We say
that the hypotheses $H_0$ and $H_1$ are perfectly detectable, or that the
problem of detecting the signal process $S(\cdot)$ in the presence of the
noise process $N(\cdot)$ is singular, if there exists a set $A$ in the sample
space $\Omega$ such that

(4.8)  $P_N[A] = 0$,  $P_{S+N}[A] = 1$.

By choosing $A$ as the rejection region for $H_0$, one has probability
zero of incorrectly identifying noise as signal plus noise or signal plus
noise as noise. Note that the probability measures $P_N$ and $P_{S+N}$ are
orthogonal if (4.8) holds. By definition, then, the hypotheses $H_0$
and $H_1$ are perfectly detectable if and only if $P_N$ and $P_{S+N}$ are
orthogonal.

The regular detection problem: The problem of detecting the signal
process in the presence of the noise process $N(\cdot)$ is called regular if
$P_N$ and $P_{S+N}$ are not orthogonal. In this case, new principles must be
introduced in order to optimally choose the rejection region $R$. First,
one distinguishes two types of errors that can occur:

(i) a false alarm (or error of type I) occurs when one says that
signal plus noise is present when in fact noise alone is present;

(ii)  a  detection  failure  (or  error  of  type  II)  occurs  when  one  says 
that  noise  alone  is  present  when  in  fact  signal  plus  noise  is  present. 

A rejection region $R$ is then characterized by two numbers $\alpha$ and
$\beta$, defined by

(4.9)  $\alpha = \mathrm{Prob}[\text{false alarm}] = \mathrm{Prob}[\{X(t),\ t \in T\} \in R \mid H_0]$,

(4.10)  $\beta = \mathrm{Prob}[\text{detection failure}] = 1 - \mathrm{Prob}[\{X(t),\ t \in T\} \in R \mid H_1]$.


In certain cases it may be possible to assign a numerical measure
to the seriousness of a false alarm and of a detection failure; one denotes
these costs by* $L_1$ and $L_2$ respectively. Further it may be possible
to determine the fraction $\pi_S$ of experimental situations in which signal
plus noise is present; one calls $\pi_S$ the prior probability that signal
is present. To each critical region one can assign a risk $\rho$, called
the Bayes risk and defined as the expected cost of an incorrect decision:

(4.11)  $\rho = \alpha (1 - \pi_S) L_1 + \beta \pi_S L_2$.

The Bayes rejection region $R$ (or optimum rejection region according to
the Bayes criterion) is defined as the region which minimizes $\rho$, the
expected cost of an incorrect decision.**

*In the general theory of hypothesis testing, a false alarm is called
a type I error, and $L_1$ is the cost of a type I error, while a detection
failure is called a type II error and $L_2$ is the cost of a type II error.
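
The quantities in (4.9) to (4.11) can be made concrete in a small numerical sketch. The scalar model below is an assumption introduced only for illustration: a single observation $X$ that is $N(0,1)$ under $H_0$ and $N(m,1)$ under $H_1$, with a rejection region of the form $\{x > c\}$. The function names and parameter values are hypothetical.

```python
import math

def normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def error_probabilities(cutoff, signal_mean):
    """Return (alpha, beta) of (4.9)-(4.10) for the region R = {x > cutoff}.

    Assumed model: X ~ N(0, 1) under H_0 and X ~ N(signal_mean, 1) under H_1.
    """
    alpha = 1.0 - normal_cdf(cutoff)          # Prob[X in R | H_0]
    beta = normal_cdf(cutoff - signal_mean)   # 1 - Prob[X in R | H_1]
    return alpha, beta

def bayes_risk(alpha, beta, prior_signal, cost_false_alarm, cost_miss):
    """Bayes risk (4.11): rho = alpha (1 - pi_S) L_1 + beta pi_S L_2."""
    return (alpha * (1.0 - prior_signal) * cost_false_alarm
            + beta * prior_signal * cost_miss)

# Illustrative values only.
alpha, beta = error_probabilities(cutoff=1.0, signal_mean=2.0)
rho = bayes_risk(alpha, beta, prior_signal=0.5,
                 cost_false_alarm=1.0, cost_miss=1.0)
print(alpha, beta, rho)
```

With the cutoff placed midway between the two means, the two error probabilities coincide, and with unit costs the risk reduces to the average probability of error.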

It may be difficult to use a Bayes rejection region for one or both
of the following reasons:

(i) because of difficulty in assigning the losses $L_1$ and $L_2$,

(ii) because of difficulty in assigning the prior probability $\pi_S$.

In these circumstances one may use the Neyman-Pearson rejection region
(or optimum rejection region according to the Neyman-Pearson criterion)
which is defined as the rejection region $R$ minimizing $\beta$, the detection
failure probability, subject to the restriction that $\alpha$, the false alarm
probability, is less than or equal to some desired level $\alpha_0$.

We  next  show  how  one  may  determine  the  Bayes  rejection  region  and 
the  Neyman-Pearson  rejection  region  by  introducing  probability  density 
functionals . 

**Note that a rejection region which minimizes the average probability
of error, $\alpha (1 - \pi_S) + \beta \pi_S$, is the same as the Bayes rejection
region with unit costs, $L_1 = L_2 = 1$.

Let us assume that there exists a measure $Q$ on the measurable
subsets of the sample space $\Omega$, and functions $p_N$ and $p_{S+N}$ with
domain $\Omega$, with the property that, for every measurable subset $B$ of
$\Omega$,

(4.12)  $P_N[B] = \int_B p_N \, dQ$,

(4.13)  $P_{S+N}[B] = \int_B p_{S+N} \, dQ$.

In order to emphasize that its argument is a function $\{X(t),\ t \in T\}$,
we call $p_N$ a functional, and sometimes denote it by $p_N(X(t),\ t \in T)$.
It is called the probability density functional of $P_N$ with respect to
$Q$. The function $p_N$ may be written symbolically as a derivative,

(4.14)  $p_N = \frac{dP_N}{dQ}$,

and is then called the Radon-Nikodym derivative of $P_N$ with respect to
$Q$. Similarly, $p_{S+N}$ is called the probability density functional, or
Radon-Nikodym derivative, of $P_{S+N}$ with respect to $Q$.

In order for the probability density functionals $p_N$ and $p_{S+N}$ to
exist, there must exist a measure $Q$ with respect to which both $P_N$ and
$P_{S+N}$ are absolutely continuous; recall that $P_N$ is absolutely continuous
with respect to $Q$ if

(4.15)  $Q(A) = 0$ implies $P_N(A) = 0$ for all $A \subset \Omega$.

Such a measure is given by

(4.16)  $Q = P_N + P_{S+N}$.
Consequently there is no loss of generality in assuming that there exist
functions $p_N$ and $p_{S+N}$ satisfying (4.12) and (4.13) respectively.

In terms of probability density functionals, the false alarm probability
$\alpha$ and detection failure probability $\beta$ of a rejection region $R$ are
given by

(4.17)  $\alpha = \int_R p_N \, dQ$,

(4.18)  $\beta = 1 - \int_R p_{S+N} \, dQ$.


Consequently, the Bayes risk $\rho$ of a rejection region $R$ is given by

(4.19)  $\rho = \int_R \{ (1 - \pi_S) L_1 \, p_N - \pi_S L_2 \, p_{S+N} \} \, dQ + \pi_S L_2$.

To minimize $\rho$, one should choose $R$ as the set of observations
$\{X(t),\ t \in T\}$ for which the integrand in (4.19) is negative, so that the
Bayes rejection region $R$ is given by

(4.20)  $R = \left\{ \{X(t),\ t \in T\} : \frac{p_{S+N}}{p_N} > \frac{(1 - \pi_S) L_1}{\pi_S L_2} \right\}$.


The ratio

(4.21)  $\frac{p_{S+N}}{p_N}$

of probability density functionals is called in classical statistical
literature the likelihood ratio since $p_N(X(t),\ t \in T)$ is defined to be
the likelihood that noise alone is present given that the observed time
series was $\{X(t),\ t \in T\}$, and $p_{S+N}(X(t),\ t \in T)$ is the likelihood
that signal plus noise is present given that the observed time series
was $\{X(t),\ t \in T\}$.

The likelihood ratio (4.21) has a probabilistic meaning which shows
that the measure $Q$ used in defining the likelihood ratio plays no role.
Let

(4.22)  $A = \{ \{X(t),\ t \in T\} : p_{S+N} > 0 \text{ and } p_N = 0 \}$.

If $P_{S+N}[A] = 0$, then $P_{S+N}$ is absolutely continuous with respect to
$P_N$, and the Radon-Nikodym derivative of $P_{S+N}$ with respect to $P_N$ is
given by the likelihood ratio:

(4.23)  $\frac{dP_{S+N}}{dP_N} = \frac{p_{S+N}}{p_N}$.


The Radon-Nikodym derivative of $P_{S+N}$ with respect to $P_N$ is called
the probability density functional of signal plus noise with respect to
noise; where no ambiguity can arise it is denoted by $p$. To summarize,
the probability density functional $p$ of signal plus noise with respect
to noise is a function on the sample space satisfying

(4.24)  $P_{S+N}[B] = \int_B p \, dP_N$,  $B \subset \Omega$,

and exists if and only if $P_{S+N}$ is absolutely continuous with respect
to $P_N$, which holds if and only if $P_{S+N}[A] = 0$, where $A$ is defined
by (4.22); it then follows that $p$ is given by the likelihood ratio

(4.25)  $p = \frac{p_{S+N}}{p_N}$.


If $P_{S+N}[A] > 0$, instead of (4.24), one has for every measurable
subset $B$ of $\Omega$,

(4.26)  $P_{S+N}[B] = \int_B p \, dP_N + P_{S+N}[A \cap B]$,

where $p$ is still given by (4.25); (4.26) is an example of the Lebesgue
decomposition theorem, which states that for the probability measures $P_N$
and $P_{S+N}$ there exist a function $p$ on $\Omega$ and a set $A$ such that
$P_N[A] = 0$ and (4.26) holds. We have made this assertion concrete by
showing how one may find $p$ and $A$.
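
The construction of $p$ and $A$ can be traced on a finite sample space, where every integral is a finite sum. The sketch below is an illustration under assumptions made here: a hypothetical four-point sample space with made-up probabilities, and the dominating measure $Q = P_N + P_{S+N}$ of (4.16). Exact rational arithmetic is used so that (4.26) can be checked without rounding.

```python
from fractions import Fraction as F

# Hypothetical four-point sample space; the probabilities are illustrative only.
omega = ["w1", "w2", "w3", "w4"]
P_N = {"w1": F(1, 2), "w2": F(1, 4), "w3": F(0), "w4": F(1, 4)}
P_SN = {"w1": F(1, 4), "w2": F(1, 4), "w3": F(1, 2), "w4": F(0)}

# Dominating measure Q = P_N + P_{S+N}, as in (4.16); here Q is positive at every point.
Q = {w: P_N[w] + P_SN[w] for w in omega}

# Densities with respect to Q, as in (4.12) and (4.13).
p_N = {w: P_N[w] / Q[w] for w in omega}
p_SN = {w: P_SN[w] / Q[w] for w in omega}

# The set A of (4.22): points where p_{S+N} > 0 and p_N = 0.
A = {w for w in omega if p_SN[w] > 0 and p_N[w] == 0}

# The likelihood ratio p of (4.25); its value where p_N = 0 is irrelevant
# for integrals with respect to P_N, so it is set to 0 there.
p = {w: (p_SN[w] / p_N[w] if p_N[w] > 0 else F(0)) for w in omega}

def decomposition_rhs(B):
    """Right-hand side of (4.26): sum of p dP_N over B, plus P_{S+N}[A and B]."""
    continuous_part = sum(p[w] * P_N[w] for w in B)
    singular_part = sum(P_SN[w] for w in B if w in A)
    return continuous_part + singular_part

B = {"w2", "w3"}
lhs = sum(P_SN[w] for w in B)
rhs = decomposition_rhs(B)
print(A, lhs, rhs)
```

In this example $A$ consists of the single point that carries signal-plus-noise probability but no noise probability, and the two sides of (4.26) agree for the chosen $B$.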


Optimum detectors: A random variable $U$ (that is, a function on
the sample space $\Omega$) which has the property that the rejection region
which is optimum according to a certain criterion may be expressed in
terms of the values of $U$ is called an optimum detector according to
that criterion.

From (4.20) it follows that the likelihood ratio is an optimum
Bayes detector. Indeed the optimum rejection region according to the
Bayes criterion consists of all observations $\{X(t),\ t \in T\}$ for which
the likelihood ratio is above a certain threshold value given by

(4.27)  $\frac{(1 - \pi_S) L_1}{\pi_S L_2}$.
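
The optimality of the likelihood ratio threshold rule can be checked by brute force when the sample space is finite. The sketch below is an illustration under assumptions made here: a hypothetical four-point sample space with made-up probabilities, $Q$ taken as counting measure (so that the densities are the point probabilities), and equal priors with unit costs. It enumerates all rejection regions and compares the minimizer of the Bayes risk with the region obtained from the threshold (4.27).

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical four-point sample space; the probabilities are illustrative only.
omega = ["w1", "w2", "w3", "w4"]
P_N = {"w1": F(4, 10), "w2": F(3, 10), "w3": F(2, 10), "w4": F(1, 10)}
P_SN = {"w1": F(1, 10), "w2": F(2, 10), "w3": F(3, 10), "w4": F(4, 10)}

# Assumed prior probability of signal and costs of the two kinds of error.
pi_S, L_1, L_2 = F(1, 2), F(1), F(1)

def bayes_risk(R):
    """Bayes risk (4.11) of the rejection region R."""
    alpha = sum(P_N[w] for w in R)        # false alarm probability
    beta = 1 - sum(P_SN[w] for w in R)    # detection failure probability
    return alpha * (1 - pi_S) * L_1 + beta * pi_S * L_2

# Threshold (4.27) and the region where the likelihood ratio exceeds it.
threshold = (1 - pi_S) * L_1 / (pi_S * L_2)
likelihood_ratio_region = {w for w in omega if P_SN[w] / P_N[w] > threshold}

# Exhaustive search over all 2^4 rejection regions.
all_regions = [set(c) for k in range(len(omega) + 1)
               for c in combinations(omega, k)]
best_region = min(all_regions, key=bayes_risk)
print(sorted(likelihood_ratio_region), sorted(best_region), bayes_risk(best_region))
```

For these numbers the exhaustive search and the threshold rule select the same region.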
Similarly, it may be shown that the likelihood ratio is an optimum
Neyman-Pearson detector. Indeed the optimum rejection region according
to the Neyman-Pearson criterion consists of all observations
$\{X(t),\ t \in T\}$ for which the likelihood ratio is above a certain
threshold $\Lambda_0$, determined by the condition that the false alarm
probability be equal to $\alpha_0$:

(4.28)  $P_N\left[ \frac{p_{S+N}}{p_N} > \Lambda_0 \right] = \alpha_0$.


To prove this assertion, one uses the fact that the Neyman-Pearson
rejection region $R$ is that region which, subject to the condition

(4.29)  $\int_R p_N \, dQ \le \alpha_0$,

maximizes

(4.30)  $\int_R p_{S+N} \, dQ$.

Intuitively, one sees that the optimum rejection region $R$ should contain
those sample points which have the highest value of the likelihood ratio
(4.21) since for a given contribution to the integral in (4.29) these
points make a maximum contribution to the integral in (4.30). A formal
proof of this assertion is easily given, using the fundamental lemma of
Neyman and Pearson.
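
As an illustration of how the threshold in (4.28) is fixed by the false alarm level, consider a scalar model assumed here for the purpose of the sketch: one observation $X$ that is $N(0,1)$ under $H_0$ and $N(m,1)$ under $H_1$ with $m > 0$. The likelihood ratio is then $\exp\{mx - \frac{1}{2} m^2\}$, which is increasing in $x$, so that a threshold on the likelihood ratio is equivalent to a cutoff on $x$. The function names and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist

def neyman_pearson_cutoff(alpha_0):
    """Cutoff c with Prob[X > c | H_0] = alpha_0 when X ~ N(0, 1) under H_0."""
    return NormalDist().inv_cdf(1.0 - alpha_0)

def detection_failure(cutoff, signal_mean):
    """beta = Prob[X <= cutoff | H_1] when X ~ N(signal_mean, 1) under H_1."""
    return NormalDist(mu=signal_mean, sigma=1.0).cdf(cutoff)

alpha_0 = 0.05        # desired false alarm level (illustrative)
signal_mean = 2.0     # assumed signal amplitude m > 0

cutoff = neyman_pearson_cutoff(alpha_0)
beta = detection_failure(cutoff, signal_mean)

# Equivalent threshold on the likelihood ratio exp(m x - m^2 / 2).
likelihood_ratio_threshold = math.exp(signal_mean * cutoff - 0.5 * signal_mean ** 2)
print(cutoff, beta, likelihood_ratio_threshold)
```

The cutoff depends only on the noise distribution and the level $\alpha_0$; the signal amplitude enters only through the resulting detection failure probability.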


Example 4B.

Detection of a specified signal in normal noise. Consider the signal
and noise processes described in Example 4A. Assume that the covariance
matrix

$K = \begin{bmatrix} K(t_1,t_1) & \cdots & K(t_1,t_n) \\ \vdots & & \vdots \\ K(t_n,t_1) & \cdots & K(t_n,t_n) \end{bmatrix}$

is non-singular with inverse denoted

$K^{-1} = \begin{bmatrix} K^{-1}(t_1,t_1) & \cdots & K^{-1}(t_1,t_n) \\ \vdots & & \vdots \\ K^{-1}(t_n,t_1) & \cdots & K^{-1}(t_n,t_n) \end{bmatrix}$.

In words, $K^{-1}(t_i,t_j)$ does not denote the reciprocal of $K(t_i,t_j)$, but
rather denotes the $(i,j)$-th element of the inverse matrix $K^{-1}$. The
signal plus noise process and the noise process then both possess
probability density functionals $p_{S+N}$ and $p_N$ with respect to Lebesgue
measure $Q$:

$p_N(X(t),\ t \in T) = \{ (2\pi)^n |K| \}^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{i,j=1}^{n} X(t_i) K^{-1}(t_i,t_j) X(t_j) \right\}$,

$p_{S+N}(X(t),\ t \in T) = \{ (2\pi)^n |K| \}^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{i,j=1}^{n} \{X(t_i) - S(t_i)\} K^{-1}(t_i,t_j) \{X(t_j) - S(t_j)\} \right\}$,

where $|K|$ is the determinant of the matrix $K$.
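The likelihood ratio is given by

$\frac{p_{S+N}}{p_N} = \exp\left\{ \sum_{i,j=1}^{n} X(t_i) K^{-1}(t_i,t_j) S(t_j) - \frac{1}{2} \sum_{i,j=1}^{n} S(t_i) K^{-1}(t_i,t_j) S(t_j) \right\}$.

In order to write this expression more compactly, let us introduce the
notation (which will play an important role in the sequel)

$(f,g)_K = \sum_{i,j=1}^{n} f(t_i) K^{-1}(t_i,t_j) g(t_j)$,

defined for any two functions $f$ and $g$ on $T$. Then

$\frac{p_{S+N}}{p_N} = \exp\left\{ (X,S)_K - \frac{1}{2} (S,S)_K \right\}$.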


Since the likelihood ratio is a monotone increasing function of $(X,S)_K$,
it follows that $(X,S)_K$ is an optimum (Bayes or Neyman-Pearson) detector.
Indeed, the rejection region for testing $H_0$ against $H_1$ can be expressed
as the set of observations $\{X(t),\ t \in T\}$ for which $(X,S)_K$ is above a
certain threshold. A detector of the form

$(X,S)_K = \sum_{i,j=1}^{n} X(t_i) K^{-1}(t_i,t_j) S(t_j)$

is said to be a "correlation detector" or a "matched filter" since it is
obtained by "correlating" or "matching" the specified signal shape $S(t)$
with the observed time series $X(t)$.
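
The correlation detector can be computed without forming $K^{-1}$ explicitly, by solving the linear system $K w = S$ and then correlating the weights $w$ with the observations. The sketch below illustrates this under assumptions made here: a hypothetical three-point index set, an exponential covariance kernel, and made-up signal and observation values. The helper `solve` is a plain Gaussian elimination written for this example, not a routine taken from the paper.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                             # back substitution
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x

def inner_product_K(f, g, K):
    """(f, g)_K = sum_{i,j} f(t_i) K^{-1}(t_i, t_j) g(t_j), computed as f . (K^{-1} g)."""
    weights = solve(K, g)
    return sum(fi * wi for fi, wi in zip(f, weights))

def log_likelihood_ratio(x, s, K):
    """Logarithm of p_{S+N} / p_N, namely (X, S)_K - (1/2) (S, S)_K."""
    return inner_product_K(x, s, K) - 0.5 * inner_product_K(s, s, K)

# Hypothetical data on T = {t_1, t_2, t_3}; all values are illustrative.
times = [0.0, 1.0, 2.0]
K = [[math.exp(-abs(s - t)) for t in times] for s in times]  # assumed kernel
S = [1.0, 2.0, 1.0]                                          # specified signal
X = [0.8, 2.3, 0.7]                                          # observed series

detector = inner_product_K(X, S, K)        # the correlation detector (X, S)_K
log_ratio = log_likelihood_ratio(X, S, K)
print(detector, log_ratio)
```

Comparing `detector` with a threshold gives the same decision as comparing the likelihood ratio with the corresponding transformed threshold, since the two differ by a monotone increasing transformation.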

Exercises

1. In the notation defined in Example 2A, show that $f_1(\cdot)$ does not
belong to $C$, while $f_2(\cdot)$ and $f_3(\cdot)$ do belong to $C$.

2. Let $H_0$ and $H_1$ be simple hypotheses about a time series
$\{X(t),\ 0 < t < 1\}$. Under both hypotheses, $\{X(t),\ 0 < t < 1\}$ is a
normal process with covariance kernel

$K(s,t) = \varphi(s) \, \varphi(t)$

and mean value function respectively given by

$H_0$:  $E[X(t)] \equiv 0$,

$H_1$:  $E[X(t)] = S(t)$.

Assume that $S(\cdot)$ and $\varphi(\cdot)$ are orthogonal,

$\int_0^1 S(t) \, \varphi(t) \, dt = 0$.

Show that $H_0$ and $H_1$ are perfectly detectable.

Hint: $U = \int_0^1 S(t) X(t) \, dt$ is a perfect detector, since

$P_0[U = 0] = 1$,  $P_1\left[ U = \int_0^1 S^2(t) \, dt \ne 0 \right] = 1$.
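
The hint can be explored numerically. The sketch below is an illustration under assumptions made here: the particular choices $S(t) = \sin 2\pi t$ and $\varphi(t) = \cos 2\pi t$ (which are orthogonal on the unit interval), a midpoint-rule approximation to the integrals, and the representation of a normal process with covariance $\varphi(s)\varphi(t)$ as its mean plus $\eta \, \varphi(t)$ with $\eta$ a standard normal random variable. It is not a proof of the exercise.

```python
import math
import random

def S(t):
    """Assumed signal, chosen for illustration."""
    return math.sin(2.0 * math.pi * t)

def phi(t):
    """Assumed factor of the kernel K(s, t) = phi(s) phi(t); orthogonal to S on [0, 1]."""
    return math.cos(2.0 * math.pi * t)

def integrate(f, n=1000):
    """Midpoint-rule approximation to the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

def detector(sample_path):
    """U = integral over [0, 1] of S(t) X(t) dt for a given sample path X."""
    return integrate(lambda t: S(t) * sample_path(t))

rng = random.Random(0)
eta = rng.gauss(0.0, 1.0)   # one draw of the random amplitude

def path_under_H0(t):
    """Sample path when noise alone is present."""
    return eta * phi(t)

def path_under_H1(t):
    """Sample path when signal plus noise is present."""
    return S(t) + eta * phi(t)

u0 = detector(path_under_H0)
u1 = detector(path_under_H1)
signal_energy = integrate(lambda t: S(t) ** 2)
print(u0, u1, signal_energy)
```

Whatever value of the random amplitude is drawn, the detector returns (up to rounding) zero under $H_0$ and the signal energy under $H_1$, which is the sense in which it separates the two hypotheses.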
3. Let $H_0$ and $H_1$ be simple hypotheses about a time series
$\{X(t),\ 0 < t < 1\}$. Under both hypotheses, $\{X(t),\ 0 < t < 1\}$ is
of the form

$X(t) = \eta \, \varphi(t)$

where $\varphi(t)$ is a non-random function and $\eta$ is a normal random
variable,

Under $H_0$:  $E[\eta] = 0$,  $\mathrm{Var}[\eta] = 1$,

Under $H_1$:  $E[\eta] = m$,  $\mathrm{Var}[\eta] = 1$.

Show that an optimum detector for testing $H_0$ against $H_1$ is
given by

$\exp\left\{ -\frac{1}{2} \{ (\eta - m)^2 - \eta^2 \} \right\} = \exp\left\{ m \eta - \frac{1}{2} m^2 \right\}$

and therefore is given by $\eta$. A possible formula for $\eta$ is

$\eta = \frac{X(t_1)}{\varphi(t_1)}$

for any point $t_1$ in $0 < t_1 < 1$ such that $\varphi(t_1) \ne 0$.